Method for performing single instruction multiple data operations on packed data

ABSTRACT

Instructions for performing SIMD instructions, including parallel absolute value and parallel conditional move instructions, as well as a method and circuit for saturating results of operations. The parallel absolute value instruction determines the absolute value of operands based on the sign bit of the operands. When a parallel conditional move instruction is executed, status indicators corresponding to an operand are compared to a condition code in a register to determine whether the condition is true for any of the status indicators; if the condition is true, the corresponding operand is moved to a specified register. A method and circuit for handling saturation of a result of an operation are also provided. When two m-bit operands are added, as in an addition, average, or subtraction operation, if an average instruction is executed, the m most significant bits are output; otherwise, the m least significant bits are output and the result is saturated if there is overflow and saturation is enabled.

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of provisional United States patent application entitled “Digital Signal Coprocessor,” application No. 60/492,060, filed on Jul. 31, 2003.

FIELD OF THE INVENTION

This invention relates to single instruction multiple data (“SIMD”) operations on packed data in a processor, particularly instructions causing a processor to determine an absolute value or perform a conditional move of operands or where the result may be saturated.

BACKGROUND ART

Single instruction, multiple data (“SIMD”) style processing has been used to accelerate multimedia processing, including image processing and data compression. Instruction sets for processors often include SIMD instructions where multiple data elements are packed in a single wide register, with the individual data elements operated on in parallel. Using this approach, multiple operations can be performed with one instruction, thus improving performance. One example is INTEL's MMX (multimedia extension) instruction set.

It would be advantageous to provide new SIMD instructions and supporting circuitry to further enhance multimedia processing, for instance, image segmentation or clipping.

SUMMARY OF THE INVENTION

SIMD instructions, including parallel absolute value and parallel conditional move, for parallel processing of packed data are provided as well as a circuit for saturating the result of an operation. Other operations in the instruction set include parallel add, parallel subtract, parallel compare, parallel maximum, and parallel minimum. The operations indicated by the instructions are carried out in the arithmetic logic unit (“ALU”) of a processor.

An instruction indicates, among other things, the operation and the data, in the form of a data word containing data elements, on which the operation is performed. Each data word contains several elements; the number of elements is determined by the mode of operation indicated by the instruction. For instance, when an 8-bit mode is specified, a 32-bit data word contains 4 8-bit data elements, or operands, while in 16-bit mode, the same 32-bit data word contains 2 16-bit operands.

A parallel status flags (“PSF”) register stores the parallel status flags (“PSFs”), which monitor the status of data elements in a data word. PSFs indicate whether the result of an integer operation is zero, the sign of the result of an integer operation, whether there was a carry out from the ALU operation, and whether there was a 2's complement integer overflow result. The PSF register is updated whenever a SIMD instruction that updates PSF flags is performed.

A parallel conditional test (“PTEST”) register contains a code which maps to a test condition. During parallel conditional move (“PCMOV”) instructions, status flags in the PSF register are compared to the test condition in the PTEST register and, if the flags and condition match, the suboperand corresponding to the flags in the PSF register is moved to a specified register.

During parallel absolute value (“PABS”) instructions, the processor determines the absolute value of at least two operands and places the absolute value of the operands in specified registers. The absolute value is determined by using one of the following approaches based on the sign bit of each of the operands: 1) where the sign bit of an operand is 1 and at least one of the other bits is 1, the absolute value of the operand is the 2's complement of the operand; 2) where the sign bit of the operand is 1 and each of the other bits is 0, the absolute value of the operand is the 1's complement of the operand; and 3) where the sign bit of the operand is 0, the absolute value of the operand is the value of the operand.

A method and circuit for handling saturation of a result of an operation are also provided. When two m-bit operands are added, as in an addition, average, or subtraction operation, if an average instruction is executed, the m most significant bits are output; otherwise, the m least significant bits are output and the result is saturated if there is overflow and saturation is enabled.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a processor that may be used to execute the SIMD instructions of the invention.

FIG. 2 is a block diagram of a processor status word that may be used to indicate an SIMD instruction in accordance with the invention.

FIG. 3 is a block diagram of an SIMD instruction in accordance with the invention.

FIG. 4 is a block diagram of data words used in executing an SIMD instruction in accordance with the invention.

FIG. 5 is a diagram of a saturation circuit used when executing an SIMD instruction in accordance with the invention.

FIG. 6 is a flow chart showing execution of a parallel absolute value instruction in accordance with the invention.

FIG. 7 is a flow chart showing execution of a parallel conditional move instruction in accordance with the invention.

DETAILED DESCRIPTION

FIG. 1 shows a processor 10, or digital signal engine (“DSE”), that may be used to execute SIMD instructions in one embodiment of the invention. Among the features in the DSE 10 are an instruction memory 18, an instruction register, a dual port data memory 14, and an integer SIMD ALU 16, where the SIMD instructions are executed. Instructions are stored in a processor-readable medium, which includes any medium that can store or transfer information; examples of a processor-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, a floppy diskette, a compact disc, an optical disc, etc. Other processors may be used in other embodiments.

In one embodiment, the DSE is controlled by a processor status word (“PSW”) register. In FIG. 2, the PSW 20 is 32 bits long and includes the DSE program counter 22, which holds the address of the next DSE instruction to be executed. For purposes of the invention, other bits of interest include: bit 15, the non-saturation (“NSAT”) bit 58, which when set to “1” indicates the result should not be saturated, and if set to “0” indicates the result should be saturated if necessary; bit 25, the HSIMD bit 34, which when set to “1” indicates that half-word (for instance, if a word is 32 bits long, half a word is 16 bits) operations should be used (in one embodiment, if the HSIMD bit is not set to “1”, 8-bit operations should be used); bit 30, the USIMD bit 36, which when set to “1”, indicates the PADD, PSUB, PAVG, PCMP, PMIN, and PMAX operations use unsigned operands; and bit 31, the SIMD bit 38, which when set to “1” indicates SIMD instructions are to be used (this bit may be employed, for instance, when SIMD instructions are aliased with sum of absolute difference (“SAD”) instructions in one embodiment). The remaining bits 24 are used to control processor operation. The use of the PSW 20 and the assignment of bits is included here as an example; in other embodiments, the use of SIMD instructions may be controlled in other ways.

With respect to FIG. 3, a DSE instruction 40 used in one embodiment is 20 bits long. Six bits indicate the OpCode 42, and 7 bits each are used to indicate the register addresses (rb and ra) 44 and 46 of the operands. In FIG. 4, the data words, Word A 48 and Word B 50, are shown to be 32 bits long 54 in one embodiment (in other embodiments, data words may be of some other length) and to contain a number of data elements. Depending on the type of operations specified in the PSW, each 32-bit word 48, 50 may consist of two 16-bit words (for instance, bytes D and C 52) or four 8-bit words (for instance, byte A 56).
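As a rough illustration of the packing just described, the following Python sketch splits a 32-bit data word into 8-bit or 16-bit elements and reassembles it. The helper names `unpack` and `pack` are hypothetical, and the sketch assumes element 0 is the least significant byte or half-word.

```python
def unpack(word, lane_bits):
    """Split a 32-bit word into elements, least significant element first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, 32, lane_bits)]


def pack(lanes, lane_bits):
    """Reassemble elements (least significant first) into a 32-bit word."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & mask) << (i * lane_bits)
    return word
```

For example, `unpack(0x11223344, 8)` yields four byte elements, while `unpack(0x11223344, 16)` yields two half-word elements from the same word.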

A parallel status flags (“PSF”) register is part of the DSE. PSFs are used to monitor the status of data elements in data words. The flags are as follows: Zero (“Z”) indicates if the result of an integer operation is zero; Sign (“S”) indicates the sign of the result of an integer operation; Carry (“CY”) indicates there was a carry out from the ALU operation; and Overflow (“OV”) indicates a 2's complement integer overflow result. The register has the following format:

| Bit | Function |
|---|---|
| 31:16 | reserved |
| 15:12 | PSF3 flags |
| 11:8 | PSF2 flags |
| 7:4 | PSF1 flags |
| 3 | PSF0 OV flag |
| 2 | PSF0 CY flag |
| 1 | PSF0 S flag |
| 0 | PSF0 Z flag |

The PSF register is updated whenever a SIMD instruction that updates PSF flags is performed. In 8-bit mode, computations on byte 0 (the least significant byte) affect PSF0, computations on byte 1 affect PSF1, etc. In 16-bit mode, computations on the lower half-word affect PSF1 while computations on the upper half-word affect PSF3; PSF0 and PSF2 are undefined. Other embodiments of the invention may feature different approaches to handling PSFs.
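The register layout can be sketched in Python as a small decoder. The function name `psf_flags` is hypothetical, and the sketch assumes that the PSF1, PSF2, and PSF3 fields use the same OV, CY, S, Z bit order that the format gives for PSF0; the text does not state this explicitly.

```python
def psf_flags(psf_reg, n):
    """Return the four flags of field PSFn (n = 0..3) from a PSF register value.

    Assumption: every 4-bit field is ordered like PSF0 (bit 3 OV, bit 2 CY,
    bit 1 S, bit 0 Z).
    """
    nibble = (psf_reg >> (4 * n)) & 0xF
    return {
        "Z": nibble & 1,
        "S": (nibble >> 1) & 1,
        "CY": (nibble >> 2) & 1,
        "OV": (nibble >> 3) & 1,
    }
```

In 16-bit mode only `psf_flags(reg, 1)` and `psf_flags(reg, 3)` would be meaningful, since PSF0 and PSF2 are undefined in that mode.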

The DSE also features a parallel condition test (“PTEST”) register. The PTEST register is used when a parallel conditional move (“PCMOV”) instruction is executed. As discussed in greater detail below, a PCMOV operation compares status flags in the PSF register against the test condition specified in the PTEST register; if the flags and the condition match, the suboperand is moved to a specified register. The PTEST register has the following format:

| Bit | Function |
|---|---|
| 31:4 | reserved |
| 3:0 | condition code |

Each 4-bit condition code in the PTEST register maps to a test condition as follows:

| Code | Mnemonic | Compare Function | Description |
|---|---|---|---|
| 0 | JMP | | Move always |
| 1 | JCY | | Move if CY = 1 |
| 2 | JE | Equal | Move if zero (Z = 1) |
| 3 | JNE | Not Equal | Move if not zero (Z = 0) |
| 4 | JL | Less Than | Move if negative = (sign XOR overflow) |
| 5 | JGE | Greater or Equal | Move if positive |
| 6 | JG | Greater Than | Move if positive non-zero = not zero AND not (sign XOR overflow) |
| 7 | JLE | Less or Equal | Move if zero = zero OR (sign XOR overflow) |
| 8 | JOV | | Move if overflow (OV = 1) |
| 9 | JNOV | | Move if not overflow (OV = 0) |
| 10 | JS | | Move if sign = 1 (S = 1) |
| 11 | JNS | | Move if sign = 0 (S = 0) |
| 12 | reserved | | |
| 13 | JHI | Unsigned Greater Than | Move if High (CY = 0 and Z = 0) |
| 14 | JLS | Unsigned Less Than or Equal | Move if Lower or Same (CY = 1 OR Z = 1) |
| 15 | reserved | | |

Other embodiments of the invention may feature different approaches to handling condition codes and the PTEST register.
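The mapping from condition code to test can be sketched as a single Python predicate over one set of flags. The name `condition_true` is hypothetical. The sketch follows the listed conditions literally, with one assumption: code 5 (JGE, "move if positive") is taken to be the complement of JL, i.e. NOT (S XOR OV), since no flag expression is given for it.

```python
def condition_true(code, z, s, cy, ov):
    """Evaluate a 4-bit PTEST condition code against one set of Z, S, CY, OV flags."""
    less = s ^ ov  # "negative" in the signed-compare sense
    table = {
        0: True,                      # JMP
        1: cy == 1,                   # JCY
        2: z == 1,                    # JE
        3: z == 0,                    # JNE
        4: less == 1,                 # JL
        5: less == 0,                 # JGE (assumed: complement of JL)
        6: z == 0 and less == 0,      # JG
        7: z == 1 or less == 1,       # JLE
        8: ov == 1,                   # JOV
        9: ov == 0,                   # JNOV
        10: s == 1,                   # JS
        11: s == 0,                   # JNS
        13: cy == 0 and z == 0,       # JHI
        14: cy == 1 or z == 1,        # JLS
    }
    if code not in table:
        raise ValueError("reserved or invalid condition code: %r" % (code,))
    return table[code]
```

Raising an error for the reserved codes 12 and 15 is a choice made for this sketch only; the behavior of reserved codes is not specified here.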

SIMD instructions may be executed when the DSE is in SIMD mode (in other words, the SIMD bit discussed above is set to “1”). These instructions take 1 cycle to execute. SIMD instructions which may be executed by the processor described above include the following: a parallel absolute value (“PABS”) instruction, which determines the absolute value of an operand and places that value in a specified register; parallel add/subtract (“PADD/PSUB”) instructions that add or subtract operands and place the results in specified registers; a parallel average (“PAVG”) instruction that averages two values and places the result in a specified register; parallel max/min (“PMAX/PMIN”) instructions that compare two values and write the greater or lesser value into a specified register; a parallel integer compare (“PCMP”) instruction that compares two operands and modifies condition code flags in the parallel status flag register; and a parallel conditional move (“PCMOV”) instruction that compares status flags in the PSF register with the condition code in the PTEST register and, if the flags and code match, moves the operand to a specified register. The instructions and their actions may be summarized as follows:

| Instruction | Action |
|---|---|
| PADD | B[i] + A[i] → B[i] |
| PAVG | (A[i] + B[i])>>1 → B[i] |
| PSUB | B[i] − A[i] → B[i] |
| PABS | B[i] = \|A[i]\| |
| PMIN | If B[i] > A[i] then B[i] = A[i] |
| PMAX | If B[i] < A[i] then B[i] = A[i] |
| PCMP | B[i] − A[i] → PSF[i] |
| PCMOV | If PTEST = PSF[i] then A[i] → B[i] |

As noted above, when the HSIMD bit in the PSW is set to “1,” 16-bit, or half-word, operations are used; otherwise, 8-bit, or byte, operations are employed. (The remainder of this discussion will address the use of 32-bit data words and 16- or 8-bit operations. This limitation is for explanatory purposes only. Other embodiments may use 64- or 128-bit data words and 32- or 64-bit operations, etc.) When the USIMD bit is set to “1,” PMIN and PMAX use unsigned operands. When the NSAT bit is set to “1,” the result should not be saturated. The following table shows which instructions are affected when certain PSW bits are set: Instruction SIMD USIMD HSIMD NSAT PABS X X PADD X X X X PAVG X X X PCMOV X PCMP X X PMAX X X PMIN X X PSUB X X X X

Sample opcodes for the instructions and updated settings in the PSF register following execution of each instruction are shown below:

| Instruction | Opcode | Z | S | CY | OV |
|---|---|---|---|---|---|
| PADD | 111000 | X | X | X | x |
| PAVG | 111001 | X | X | X | 0 |
| PSUB | 111010 | X | X | X | x |
| PMIN | 111100 | X | X | X | x |
| PMAX | 111101 | X | X | X | x |
| PABS | 111011 | X | 0 | X | x |
| PCMP | 111110 | X | X | X | x |
| PCMOV | 111111 | | | | |

The OV flag is set to zero after execution of a PAVG instruction because there is never overflow when this instruction is executed. The S flag is cleared to 0 after execution of a PABS instruction. Execution of a PCMOV instruction does not affect PSFs. Other embodiments may, of course, use different opcodes to identify each instruction.

The PAVG instruction may be executed in 8- or 16-bit mode and may operate on signed or unsigned data. The USIMD PSW bit determines whether sign-extension is done before adding the operands. If the USIMD bit is set, the operands are zero-padded by one bit. If USIMD is not set, the operands are sign-extended by one bit. In 16-bit mode, the PAVG operation is as follows:

-   rb[31:16]=({(USIMD?0:rb[31]), rb[31:16]}+{(USIMD?0:ra[31]), ra[31:16]})[16:1]
-   rb[15:0]=({(USIMD?0:rb[15]), rb[15:0]}+{(USIMD?0:ra[15]), ra[15:0]})[16:1]

(Here, if the USIMD bit is set, the operand is zero-padded by one bit; otherwise the operand is sign-extended, i.e., its most significant bit, bit 31 or bit 15, is repeated.)

PSFs following execution of a PAVG instruction in 16-bit mode are as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:16] == 0) ? 1:0 | undefined | (rb[15:0] == 0) ? 1:0 | undefined |
| S | rb[31] | undefined | rb[15] | undefined |
| CY | cout[31] | undefined | cout[15] | undefined |
| OV | 0 | undefined | 0 | undefined |

In 8-bit mode, the PAVG operation is as follows:

-   rb[31:24]=({(USIMD?0:rb[31]), rb[31:24]}+{(USIMD?0:ra[31]), ra[31:24]})[8:1]
-   rb[23:16]=({(USIMD?0:rb[23]), rb[23:16]}+{(USIMD?0:ra[23]), ra[23:16]})[8:1]
-   rb[15:8]=({(USIMD?0:rb[15]), rb[15:8]}+{(USIMD?0:ra[15]), ra[15:8]})[8:1]
-   rb[7:0]=({(USIMD?0:rb[7]), rb[7:0]}+{(USIMD?0:ra[7]), ra[7:0]})[8:1]

Following execution of the PAVG operation in 8-bit mode, PSFs are as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:24] == 0) ? 1:0 | (rb[23:16] == 0) ? 1:0 | (rb[15:8] == 0) ? 1:0 | (rb[7:0] == 0) ? 1:0 |
| S | rb[31] | rb[23] | rb[15] | rb[7] |
| CY | cout[31] | cout[23] | cout[15] | cout[7] |
| OV | 0 | 0 | 0 | 0 |

“rb” in the tables above refers to the final result of the instruction, not the input operand. The PAVG instruction always rounds down, not towards 0; negative numbers are rounded down towards negative infinity. Execution of the PAVG instruction provides the 8/16 most significant bits (“msbs”) of the result of a 9/17-bit PADD or PSUB operation. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
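The extend, add, and drop-the-low-bit behavior of PAVG can be sketched for a single element as follows. The function name `pavg_lane` and its parameters are hypothetical; the sketch mirrors the bit-slice expressions above rather than any particular hardware implementation.

```python
def pavg_lane(rb, ra, bits=8, usimd=False):
    """Average two 'bits'-wide elements by keeping bits [bits:1] of the extended sum."""
    mask = (1 << bits) - 1

    def extend(value):
        # Zero-pad when USIMD is set; otherwise repeat the sign bit.
        top = 0 if usimd else (value >> (bits - 1)) & 1
        return (top << bits) | (value & mask)

    total = (extend(rb) + extend(ra)) & ((1 << (bits + 1)) - 1)
    return (total >> 1) & mask
```

With signed operands, `pavg_lane(0xFD, 0x00)` (that is, -3 and 0) gives 0xFE (-2), which shows the rounding towards negative infinity; with `usimd=True` the same inputs are treated as 253 and 0 and give 0x7E.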

PADD instructions may be executed in either 16- or 8-bit mode on signed and unsigned numbers and will provide saturation if the NSAT bit is clear. (When the USIMD bit is “1,” the instructions treat the operands as unsigned operands. When the USIMD bit is “0,” the instructions treat the operands as signed operands.) In 16-bit mode, a PADD instruction operates as follows (rb and ra are the register addresses):

-   rb[31:16]=SATURATE(rb[31:16]+ra[31:16])
-   rb[15:0]=SATURATE(rb[15:0]+ra[15:0])

PSFs following execution of a PADD instruction in 16-bit mode are as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:16] == 0) ? 1:0 | undefined | (rb[15:0] == 0) ? 1:0 | undefined |
| S | rb[31] | undefined | rb[15] | undefined |
| CY | cout[31] | undefined | cout[15] | undefined |
| OV | cout[31] XOR cout[30] | undefined | cout[15] XOR cout[14] | undefined |

The PADD instruction operates in 8-bit mode as follows:

-   rb[31:24]=SATURATE(rb[31:24]+ra[31:24])
-   rb[23:16]=SATURATE(rb[23:16]+ra[23:16])
-   rb[15:8]=SATURATE(rb[15:8]+ra[15:8])
-   rb[7:0]=SATURATE(rb[7:0]+ra[7:0])

PSFs following an 8-bit operation are as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:24] == 0) ? 1:0 | (rb[23:16] == 0) ? 1:0 | (rb[15:8] == 0) ? 1:0 | (rb[7:0] == 0) ? 1:0 |
| S | rb[31] | rb[23] | rb[15] | rb[7] |
| CY | cout[31] | cout[23] | cout[15] | cout[7] |
| OV | cout[31] XOR cout[30] | cout[23] XOR cout[22] | cout[15] XOR cout[14] | cout[7] XOR cout[6] |

The “rb” in the above tables refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.

PSUB instructions may also be executed in 8-bit or 16-bit mode on signed and unsigned numbers and will provide saturation if the NSAT bit is clear. In 16-bit mode, the PSUB instruction operates as follows:

-   rb[31:16]=SATURATE(rb[31:16]−ra[31:16])
-   rb[15:0]=SATURATE(rb[15:0]−ra[15:0])

PSFs after execution of a PSUB instruction in 16-bit mode are as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:16] == 0) ? 1:0 | undefined | (rb[15:0] == 0) ? 1:0 | undefined |
| S | rb[31] | undefined | rb[15] | undefined |
| CY | cout[31] | undefined | cout[15] | undefined |
| OV | cout[31] XOR cout[30] | undefined | cout[15] XOR cout[14] | undefined |

The PSUB instruction operates in 8-bit mode as follows:

-   rb[31:24]=SATURATE(rb[31:24]−ra[31:24])
-   rb[23:16]=SATURATE(rb[23:16]−ra[23:16])
-   rb[15:8]=SATURATE(rb[15:8]−ra[15:8])
-   rb[7:0]=SATURATE(rb[7:0]−ra[7:0])

Following execution of the instruction in 8-bit operation, PSFs are as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:24] == 0) ? 1:0 | (rb[23:16] == 0) ? 1:0 | (rb[15:8] == 0) ? 1:0 | (rb[7:0] == 0) ? 1:0 |
| S | rb[31] | rb[23] | rb[15] | rb[7] |
| CY | cout[31] | cout[23] | cout[15] | cout[7] |
| OV | cout[31] XOR cout[30] | cout[23] XOR cout[22] | cout[15] XOR cout[14] | cout[7] XOR cout[6] |

The “rb” in the above tables refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.

Results may be saturated in both 8- and 16-bit mode PADD and PSUB operations (in both signed and unsigned mode). No saturation occurs for PAVG operations, since the average can never overflow, and consequently OV is always 0.

In 16-bit unsigned mode, saturation for the PADD instruction occurs as follows:

-   If ((C==1) && (NSAT==0)) rb[31:16]=0xFFFF
-   If ((C==1) && (NSAT==0)) rb[15:0]=0xFFFF

(Here, C represents the current carry value that will be written into the PSF register at the end of the instruction.)

In 8-bit unsigned mode, saturation for the PADD instruction occurs as follows:

-   If ((C==1) && (NSAT==0)) rb[31:24]=0xFF
-   If ((C==1) && (NSAT==0)) rb[23:16]=0xFF
-   If ((C==1) && (NSAT==0)) rb[15:8]=0xFF
-   If ((C==1) && (NSAT==0)) rb[7:0]=0xFF

In 16-bit unsigned mode, saturation for the PSUB instruction occurs as follows:

-   If ((C==0) && (NSAT==0)) rb[31:16]=0x0000
-   If ((C==0) && (NSAT==0)) rb[15:0]=0x0000

In 8-bit unsigned mode, saturation for the PSUB instruction occurs as follows:

-   If ((C==0) && (NSAT==0)) rb[31:24]=0x00
-   If ((C==0) && (NSAT==0)) rb[23:16]=0x00
-   If ((C==0) && (NSAT==0)) rb[15:8]=0x00
-   If ((C==0) && (NSAT==0)) rb[7:0]=0x00

In 16-bit signed mode, saturation occurs as follows:

-   If ((OV==1) && (NSAT==0) && (sum[31]==1)) rb[31:16]=0x7FFF
-   If ((OV==1) && (NSAT==0) && (sum[31]==0)) rb[31:16]=0x8000
-   If ((OV==1) && (NSAT==0) && (sum[15]==1)) rb[15:0]=0x7FFF
-   If ((OV==1) && (NSAT==0) && (sum[15]==0)) rb[15:0]=0x8000

In 8-bit signed mode, saturation occurs as follows:

-   If ((OV==1) && (NSAT==0) && (sum[31]==1)) rb[31:24]=0x7F
-   If ((OV==1) && (NSAT==0) && (sum[31]==0)) rb[31:24]=0x80
-   If ((OV==1) && (NSAT==0) && (sum[23]==1)) rb[23:16]=0x7F
-   If ((OV==1) && (NSAT==0) && (sum[23]==0)) rb[23:16]=0x80
-   If ((OV==1) && (NSAT==0) && (sum[15]==1)) rb[15:8]=0x7F
-   If ((OV==1) && (NSAT==0) && (sum[15]==0)) rb[15:8]=0x80
-   If ((OV==1) && (NSAT==0) && (sum[7]==1)) rb[7:0]=0x7F
-   If ((OV==1) && (NSAT==0) && (sum[7]==0)) rb[7:0]=0x80

If OV is 1, sum[7] is the inverse of cout[7]. Because OV=cout[6] XOR cout[7], OV=1 means that cout[6] is the inverse of cout[7]. Also, if OV=1, then sum[7]=cout[6]. Therefore, if OV=1, sum[7] is the inverse of cout[7]. As used here, OV represents the current value that will be written into the PSF register at the end of the current cycle.
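The saturation rules for PADD and PSUB can be collected into one per-element Python sketch. The function `sat_lane` and its parameters are hypothetical. One assumption is made loudly here: subtraction is modeled as rb + NOT(ra) + 1, so that the carry-out C is 0 exactly when an unsigned subtraction underflows, which is consistent with the C==0 condition given for unsigned PSUB saturation.

```python
def sat_lane(rb, ra, bits=8, sub=False, usimd=False, nsat=False):
    """Add or subtract one element and saturate the result unless NSAT is set."""
    mask = (1 << bits) - 1
    low = mask >> 1                        # all bits below the sign bit
    b = (~ra & mask) if sub else (ra & mask)
    cin = 1 if sub else 0                  # assumed: rb - ra computed as rb + ~ra + 1
    total = (rb & mask) + b + cin
    c_top = total >> bits                  # carry out of the top bit, e.g. cout[7]
    c_prev = ((rb & low) + (b & low) + cin) >> (bits - 1)  # e.g. cout[6]
    ov = c_top ^ c_prev
    result = total & mask
    if nsat:
        return result                      # saturation disabled: wrap around
    if usimd:
        if not sub and c_top == 1:
            return mask                    # unsigned add overflow: all ones
        if sub and c_top == 0:
            return 0                       # unsigned subtract underflow: zero
        return result
    if ov:
        # Wrapped sign bit 1 means positive overflow, 0 means negative overflow.
        return low if (result >> (bits - 1)) & 1 else low + 1
    return result
```

For instance, in signed 8-bit mode 0x7F + 0x01 saturates to 0x7F and 0x80 + 0xFF saturates to 0x80, while with `nsat=True` the first sum simply wraps to 0x80.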

In FIG. 5, the circuit 118 for 8-bit PADD, PSUB, and PAVG operations can handle both signed and unsigned operands. When the USIMD bit 160 is set to 1, the operands 164, 172 are treated as unsigned operands; when the USIMD bit 160 is set to 0, the operands 164, 172 are treated as signed operands. Bit 7 166, 174 of each input operand 164, 172 is input into a multiplexer 202, 200. If the USIMD bit 160 is set to 0, bit 7 166, 174 is output from each multiplexer 202, 200 as bit 8 186, 178, which is added to the input operand 164, 172 to produce a 9-bit operand 180, 184 which is input into a 9-bit adder 74. If the USIMD bit 160 is set to 1, a 0 162, 170 is output from each multiplexer 202, 200 as bit 8 186, 178, which is added to the input operand 164, 172 to produce the 9-bit operand 180, 184 which is input into the 9-bit adder 74.

Bits 6 (cout[6] 78) and 7 (cout[7] 76) of the result in the 9-bit adder 74 are input to an XOR gate 80 and the result is sent to a first AND gate 86. The other input to AND gate 86 indicates whether a PAVG instruction 82 is being executed. This input 82 is inverted 84 before it is input to the first AND gate 86. If a PAVG instruction 82 is being executed, the input to the AND gate 86 is 0. If both the inputs to the first AND gate 86 from the inverter 84 and the XOR gate 80 are 1, then the PSF OV flag 110 will be set to 1, indicating an overflow result. When a PAVG instruction 82 is executed, the PSF OV flag is always set to 0.

Cout[7] 76 is also input 208, 76 to two multiplexers 212, 204 (the bit is inverted 206 before being input to one of the multiplexers 212) along with the result 210, 202 from XOR gate 80. If the USIMD bit 160 is 1, the Cout[7] value 208, 76 is output 216, 214 to a three-way multiplexer 218. The output 108 from the three-way multiplexer 218 depends on the operation performed by the circuit: PSUB 216, PADD 214, or PAVG 200 (0 is always output if PAVG is performed). This output 108 represents the current overflow of the operation (and will be discussed further below).

The output 120 (sum[8:0]) from the adder 74 is divided into sum[7:0] 90 and sum[8:1] 88 (the average of the two operands) and sent to a multiplexer 92. If a PAVG instruction 82 is being executed, the multiplexer 92 will output 114 the average, or sum[8:1] 88, to a second multiplexer 100; otherwise, sum[7:0] 90 will be output 114 to the second multiplexer 100.

The other input 198 to the second multiplexer 100 represents saturation values. Cout[7] 76 is input to a third and fourth multiplexer 94, 192. If the value of Cout[7] 76 is 0, 0x7F 98 is output 112 from the third multiplexer 94 to a fifth multiplexer 196, while 0x00 is output 194 from the fourth multiplexer 192 to the fifth multiplexer 196. If Cout[7] 76 is 1, 0x80 95 is output 112 from the third multiplexer 94 to the fifth multiplexer 196 while 0xFF is output 194 from the fourth multiplexer 192 to the fifth multiplexer 196. If the USIMD bit 160 is 0, the output from the third multiplexer 94 is output to the second multiplexer 100; if the USIMD bit 160 is 1, the output from the fourth multiplexer 192 is sent to the second multiplexer 100.

A second AND gate 102 is connected to the second multiplexer 100. The inputs to the AND gate 102 are the output 108 from the three-way multiplexer 218 which indicates whether there is overflow in the current operation and a line 124 indicating whether the result should be saturated (if the NSAT bit 106 is set to 1, the result should not be saturated; if it is set to 0, the result should be saturated. The NSAT bit 106 is inverted 104 and input 124 to the second AND gate 102.). If there is overflow 108 and if the result should be saturated 124, the output 116 (i.e., the result of the operation) from the second multiplexer 100 is the saturated value 198. Otherwise the result 116 is either sum[7:0] 90 for a PADD or PSUB operation or sum[8:1] 88 for a PAVG operation. A circuit for handling operands of different sizes, for instance 16 bits, works on similar principles.

PMIN and PMAX can operate in 8-bit or 16-bit mode with signed or unsigned data depending on the USIMD bit. In 16-bit mode, PMIN and PMAX instructions are executed as follows:

-   rb[31:16]=MIN(rb[31:16],ra[31:16])
-   rb[15:0]=MIN(rb[15:0],ra[15:0])
-   rb[31:16]=MAX(rb[31:16],ra[31:16])
-   rb[15:0]=MAX(rb[15:0],ra[15:0])

PSFs are updated as follows following execution of a PMIN or PMAX instruction in 16-bit mode:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:16] == 0) ? 1:0 | undefined | (rb[15:0] == 0) ? 1:0 | undefined |
| S | rb[31] | undefined | rb[15] | undefined |
| CY | cout[31] | undefined | cout[15] | undefined |
| OV | cout[31] XOR cout[30] | undefined | cout[15] XOR cout[14] | undefined |

In 8-bit mode, PMIN and PMAX instructions are executed as follows:

-   rb[31:24]=MIN(rb[31:24],ra[31:24])
-   rb[23:16]=MIN(rb[23:16],ra[23:16])
-   rb[15:8]=MIN(rb[15:8],ra[15:8])
-   rb[7:0]=MIN(rb[7:0],ra[7:0])
-   rb[31:24]=MAX(rb[31:24],ra[31:24])
-   rb[23:16]=MAX(rb[23:16],ra[23:16])
-   rb[15:8]=MAX(rb[15:8],ra[15:8])
-   rb[7:0]=MAX(rb[7:0],ra[7:0])

Following execution of the PMIN or PMAX instruction in 8-bit mode, the PSFs are as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:24] == 0) ? 1:0 | (rb[23:16] == 0) ? 1:0 | (rb[15:8] == 0) ? 1:0 | (rb[7:0] == 0) ? 1:0 |
| S | rb[31] | rb[23] | rb[15] | rb[7] |
| CY | cout[31] | cout[23] | cout[15] | cout[7] |
| OV | cout[31] XOR cout[30] | cout[23] XOR cout[22] | cout[15] XOR cout[14] | cout[7] XOR cout[6] |

In the above tables, “rb” refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
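A whole-word sketch of PMIN and PMAX in Python is given below. The names `to_signed` and `pminmax` are hypothetical; the sketch covers only the result written to rb (not the PSF updates) and treats elements as 2's complement values unless `usimd` is set.

```python
def to_signed(value, bits):
    """Interpret a 'bits'-wide pattern as a 2's complement integer."""
    return value - (1 << bits) if value >> (bits - 1) else value


def pminmax(rb, ra, lane_bits=8, usimd=False, want_max=False):
    """Element-wise minimum (or maximum) of two 32-bit words."""
    mask = (1 << lane_bits) - 1
    out = 0
    for shift in range(0, 32, lane_bits):
        b = (rb >> shift) & mask
        a = (ra >> shift) & mask
        # Compare raw patterns when unsigned, signed values otherwise.
        key = (lambda v: v) if usimd else (lambda v: to_signed(v, lane_bits))
        pick = max(b, a, key=key) if want_max else min(b, a, key=key)
        out |= pick << shift
    return out
```

The same byte pattern can give different results depending on USIMD: the element pair 0xFF and 0x01 has minimum 0xFF when signed (-1 versus 1) but 0x01 when unsigned.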

The PABS instruction may be executed in either 8- or 16-bit mode depending on the HSIMD PSW bit. The NSAT bit in the PSW does not affect the behavior of the PABS instruction. In 16-bit mode, the PABS instruction is executed as follows:

-   rb[31:16]=ABS(ra[31:16])
-   rb[15:0]=ABS(ra[15:0])

After execution of the PABS instruction in 16-bit mode, the PSFs are updated as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:16] == 0) ? 1:0 | undefined | (rb[15:0] == 0) ? 1:0 | undefined |
| S | 0 | undefined | 0 | undefined |
| CY | cout[31] | undefined | cout[15] | undefined |
| OV | cout[31] XOR cout[30] | undefined | cout[15] XOR cout[14] | undefined |

In 8-bit mode, the PABS instruction is executed as follows:

-   rb[31:24]=ABS(ra[31:24])
-   rb[23:16]=ABS(ra[23:16])
-   rb[15:8]=ABS(ra[15:8])
-   rb[7:0]=ABS(ra[7:0])

After execution of the PABS instruction in 8-bit mode, the PSFs are updated as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (rb[31:24] == 0) ? 1:0 | (rb[23:16] == 0) ? 1:0 | (rb[15:8] == 0) ? 1:0 | (rb[7:0] == 0) ? 1:0 |
| S | 0 | 0 | 0 | 0 |
| CY | cout[31] | cout[23] | cout[15] | cout[7] |
| OV | cout[31] XOR cout[30] | cout[23] XOR cout[22] | cout[15] XOR cout[14] | cout[7] XOR cout[6] |

In the above tables, “rb” refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.

The flags tables assume the PABS operation results in 0-ra in the adder. Therefore, overflow will only be set in one case, when the input is 0x80 in 8-bit mode (0x8000 in 16-bit mode). This is the only instance where the true result of the PABS operation cannot be represented in the required number of bits.

The PABS function behaves as shown in FIG. 6. After the PABS instruction is received (block 60), if the sign bit of the input is 1 (block 62), and all the other bits are 0 (block 64), the result is the 1's complement of the input (block 66). If the sign bit of the input is 1 (block 62), and the other bits are not all 0 (block 64), the result is the 2's complement of the input (block 70). If the sign bit of the input is 0 (block 62), the result is the input (block 68). For example, in 8-bit mode, ABS(0xFF)=0x01, ABS(0x80)=0x7F, and ABS(0x01)=0x01. In 16-bit mode, ABS(0xFFFF)=0x0001, ABS(0x8000)=0x7FFF, and ABS(0x0FFF)=0x0FFF. Following execution of the instruction, the PSF register is updated (block 72).
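The three-way decision of FIG. 6 can be sketched for one element in Python. The function name `pabs_lane` is hypothetical, and the sketch covers only the result value, not the PSF update of block 72.

```python
def pabs_lane(ra, bits=8):
    """Absolute value of one 'bits'-wide element, following the three cases above."""
    mask = (1 << bits) - 1
    sign = (ra >> (bits - 1)) & 1
    rest = ra & (mask >> 1)        # every bit except the sign bit
    if sign == 0:
        return ra & mask           # non-negative: result is the input
    if rest == 0:
        return ~ra & mask          # most negative value: 1's complement
    return (-ra) & mask            # other negatives: 2's complement
```

The 1's complement branch is what turns the most negative input (0x80 or 0x8000) into the largest positive value (0x7F or 0x7FFF) instead of leaving it unchanged.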

The PCMP instruction may be executed in 8- or 16-bit mode on signed or unsigned operands. In executing this instruction, a subtraction is performed without updating the destination register. Instead, the condition code flags in the PSF register are modified. In 16-bit mode, the PCMP operation is as follows:

-   PSF3=CMP(rb[31:16],ra[31:16])
-   PSF1=CMP(rb[15:0],ra[15:0])

Following execution of a PCMP instruction in 16-bit mode, PSFs are updated as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (sum[31:16] == 0) ? 1:0 | undefined | (sum[15:0] == 0) ? 1:0 | undefined |
| S | sum[31] | undefined | sum[15] | undefined |
| CY | cout[31] | undefined | cout[15] | undefined |
| OV | cout[31] XOR cout[30] | undefined | cout[15] XOR cout[14] | undefined |

In 8-bit mode, the PCMP operation is as follows:

-   PSF3=CMP(rb[31:24],ra[31:24])
-   PSF2=CMP(rb[23:16],ra[23:16])
-   PSF1=CMP(rb[15:8],ra[15:8])
-   PSF0=CMP(rb[7:0],ra[7:0])

The PSF register is updated as follows:

| Flag | PSF3 | PSF2 | PSF1 | PSF0 |
|---|---|---|---|---|
| Z | (sum[31:24] == 0) ? 1:0 | (sum[23:16] == 0) ? 1:0 | (sum[15:8] == 0) ? 1:0 | (sum[7:0] == 0) ? 1:0 |
| S | sum[31] | sum[23] | sum[15] | sum[7] |
| CY | cout[31] | cout[23] | cout[15] | cout[7] |
| OV | cout[31] XOR cout[30] | cout[23] XOR cout[22] | cout[15] XOR cout[14] | cout[7] XOR cout[6] |

Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
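The flag computation for one element of a PCMP can be sketched as follows. The function name `pcmp_lane` is hypothetical. As an assumption of this sketch, the subtraction rb − ra is formed as rb + NOT(ra) + 1 and CY is the raw carry-out of that addition; the source does not spell out the carry convention for subtraction.

```python
def pcmp_lane(rb, ra, bits=8):
    """Return the Z, S, CY, OV flags for rb - ra on one element; rb is not modified."""
    mask = (1 << bits) - 1
    low = mask >> 1
    not_ra = ~ra & mask
    total = (rb & mask) + not_ra + 1          # assumed form of rb - ra
    diff = total & mask                       # "sum" in the tables above
    c_top = total >> bits                     # carry out of the top bit
    c_prev = ((rb & low) + (not_ra & low) + 1) >> (bits - 1)
    return {
        "Z": int(diff == 0),
        "S": diff >> (bits - 1),
        "CY": c_top,
        "OV": c_top ^ c_prev,
    }
```

Comparing equal elements sets Z, and comparing 0x80 with 0x01 in signed terms (-128 minus 1) sets OV because the true difference does not fit in 8 bits.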

PCMOV instructions may be executed in either 16- or 8-bit mode. The instructions test the condition code in the PTEST register (discussed above) against the 4 sets of flags in the PSF register. If the specified condition is true, the corresponding 8 or 16 bits is moved. The PCMOV instruction operates in 16-bit mode as follows:

-   If (PSF3==cnd(PTEST[3:0])) rb[31:16]=ra[31:16]
-   If (PSF1==cnd(PTEST[3:0])) rb[15:0]=ra[15:0]

The PCMOV instruction operates in 8-bit mode as follows:

-   If (PSF3==cnd(PTEST[3:0])) rb[31:24]=ra[31:24]
-   If (PSF2==cnd(PTEST[3:0])) rb[23:16]=ra[23:16]
-   If (PSF1==cnd(PTEST[3:0])) rb[15:8]=ra[15:8]
-   If (PSF0==cnd(PTEST[3:0])) rb[7:0]=ra[7:0]

To illustrate execution of a PCMOV instruction, in FIG. 7, when a PCMOV instruction is received (block 124), 8- or 16-bit mode is specified. If 16-bit mode is indicated (block 126), the PSF3 and PSF1 flags are tested against the condition code in the PTEST register (blocks 128, 132). If the specified condition is true, the operand (“ra”) associated with the tested PSF is moved to a destination register (“rb”), i.e., ra[31:16] is moved to rb[31:16] (block 130) and ra[15:0] is moved to rb[15:0] (block 134). If a specified condition is not true (blocks 128, 132) or an operand is moved (blocks 130, 134), execution of the instruction is finished (block 152).

If 8-bit mode is specified (block 126), the PSF3, PSF2, PSF1, and PSF0 flags are tested against the condition code in the PTEST register (blocks 136, 140, 144, 148). If the specified condition is true, the operand associated with the tested PSF is moved to a destination register, i.e., ra[31:24] is moved to rb[31:24] (block 138), ra[23:16] is moved to rb[23:16] (block 142), ra[15:8] is moved to rb[15:8] (block 146), and ra[7:0] is moved to rb[7:0] (block 150). If a specified condition is not true (blocks 136, 140, 144, 148) or an operand is moved (blocks 138, 142, 146, 150), execution of the instruction is finished (block 154).

The PCMOV instruction allows decisions on multiple data streams to be made in one cycle, as in, for example, clipping in image processing. Suppose 8×8 mode is specified and the following transformation of each of the four 8-bit results in register (“R”) 0 is desired:

-   If x<−30 then 0→x
-   If −30<=x<=+30 then c→x, where c is some constant
-   If 30<x then 255→x

The above may be achieved in 4 cycles, with the result in R1, as shown below. Suppose

-   PTEST=JG
-   R1=c, c, c, c
-   R2=0, 0, 0, 0
-   R3=−30, −30, −30, −30
-   R4=30, 30, 30, 30
-   R5=255, 255, 255, 255

The following instructions are issued:

-   PCMP R0, R3
-   PCMOV R2, R1
-   PCMP R4, R0
-   PCMOV R5, R1

Note that PCMP x,y does y-x and JG jumps if y>x.
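The clipping sequence can be checked with a small software model. This is a sketch only, and it rests on several assumptions that are labeled in the code: JG is taken to mean a signed "greater than" evaluated as Z == 0 and S == OV (the x86 convention), the first conditional move is taken to read its source from R2 (the zero vector), since that is what the stated transformation requires, and all helper names (`lanes`, `pack`, `pcmp`, `jg`, `pcmov`, `clip`) are hypothetical.

```python
def lanes(word):
    """Split a 32-bit word into four 8-bit lanes, lane 0 first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def pack(values):
    """Pack four values (signed or unsigned) into one 32-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word

def signed8(v):
    return v - 256 if v & 0x80 else v

def pcmp(x, y):
    """Model of PCMP x, y: compute y - x per lane and return per-lane flags."""
    flags = []
    for xv, yv in zip(lanes(x), lanes(y)):
        diff = (yv - xv) & 0xFF
        exact = signed8(yv) - signed8(xv)
        flags.append({"Z": int(diff == 0), "S": diff >> 7,
                      "OV": int(not -128 <= exact <= 127)})
    return flags

def jg(f):
    # Assumption: JG follows the x86 signed-greater rule (Z == 0 and S == OV).
    return f["Z"] == 0 and f["S"] == f["OV"]

def pcmov(src, dst, flags, cond):
    """Model of 8-bit PCMOV src, dst: copy each lane whose flags satisfy cond."""
    out = lanes(dst)
    for i, v in enumerate(lanes(src)):
        if cond(flags[i]):
            out[i] = v
    return pack(out)

def clip(r0, c):
    r1, r2 = pack([c] * 4), pack([0] * 4)
    r3, r4, r5 = pack([-30] * 4), pack([30] * 4), pack([255] * 4)
    r1 = pcmov(r2, r1, pcmp(r0, r3), jg)  # lanes with x < -30 become 0
    r1 = pcmov(r5, r1, pcmp(r4, r0), jg)  # lanes with x > 30 become 255
    return r1
```

With c = 7 and R0 holding the lanes −100, −30, 5 and 31, the model leaves 0, 7, 7 and 255 in R1, matching the three cases of the transformation.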

1. In a processor, a method for performing a parallel conditional move operation comprising: a) comparing at least two sets of status indicators which correspond to at least two operands to a corresponding condition code specified in a register to determine whether the condition indicated by the condition code is true for any of the status indicators; and b) if the condition indicated by the condition code is true for any of the status indicators, moving the corresponding operand to a specified register.
 2. The method of claim 1 wherein each of the operands is obtained from a data word having at least two data elements corresponding to operands.
 3. The method of claim 2 wherein the number of data elements in each data word is identified in an instruction.
 4. A processor-readable storage medium storing an instruction that, when executed by a processor, causes the processor to perform a method for performing a parallel condition move operation, the method comprising: a) comparing at least two sets of status indicators which correspond to at least two operands to a corresponding condition code specified in a register to determine whether the condition indicated by the condition code is true for any of the status indicators; and b) if the condition indicated by the condition code is true for any of the status indicators, moving the corresponding operand to a specified register.
 5. The processor-readable storage medium of claim 4 wherein each of the operands is obtained from a data word having at least two data elements corresponding to operands.
 6. The processor-readable storage medium of claim 5 wherein a number of operands in each data word identified in the instruction is specified in the instruction.
 7. In a processor, a method for performing a parallel absolute operation comprising: a) determining the absolute value of at least two operands by employing one of the following approaches based on the sign bit of each of the at least two operands: i) where the sign bit of an operand is 1 and at least one of the other bits is 1, the absolute value of the operand is the 2's complement of the operand; ii) where the sign bit of the operand is 1 and each of the other bits is 0, the absolute value of the operand is the 1's complement of the operand; and iii) where the sign bit of the operand is 0, the absolute value of the operand is the value of the operand, wherein overflow is set only when the operand is 0x80; and b) placing the absolute value of each of the at least two operands in at least two registers specified to receive the absolute value of the at least two operands.
 8. The method of claim 7 further comprising updating parallel status flags in each parallel status flags register corresponding to each register specified to receive the absolute value of one of the at least two operands.
 9. The method of claim 7 wherein each of the operands is obtained from a data word having at least two data elements corresponding to operands.
 10. The method of claim 9 wherein the number of data elements in each data word is identified in an instruction.
 11. A processor-readable storage medium storing an instruction that, when executed by a processor, causes the processor to perform a method for performing a parallel absolute operation, the method comprising: a) determining the absolute value of at least two operands by employing one of the following approaches based on the sign bit of each of the at least two operands: i) where the sign bit of an operand is 1 and at least one of the other bits is 1, the absolute value of the operand is the 2's complement of the operand; ii) where the sign bit of the operand is 1 and each of the other bits is 0, the absolute value of the operand is the 1's complement of the operand; and iii) where the sign bit of the operand is 0, the absolute value of the operand is the value of the operand, wherein overflow is set only when the operand is 0x80; and b) placing the absolute value of each of the at least two operands in at least two registers specified to receive the absolute value of the at least two operands.
 12. The processor-readable storage medium of claim 11 further comprising a parallel absolute instruction that causes a processor to update parallel status flags in each parallel status flags register corresponding to the specified register.
 13. The processor-readable storage medium of claim 11 wherein each of the operands is obtained from a data word having at least two data elements corresponding to operands.
 14. The processor-readable storage medium of claim 13 wherein a number of operands in each data word identified in the instruction is specified in the instruction.
 15. In a processor, a method for saturating a result of a first operation comprising: a) adding together two m-bit operands; b) outputting the m most significant bits as the result when an average operation is performed, otherwise outputting the m least significant bits as the result, wherein the result of the m least significant bits is saturated if there is overflow and if saturation is enabled; and c) placing the result in a specified register.
 16. The method of claim 15 wherein the operands are signed.
 17. The method of claim 15 wherein the operands are unsigned.
 18. The method of claim 15 further comprising setting an overflow flag when adding the two operands together results in an overflow.
 19. The method of claim 15 wherein each of the m-bit operands is obtained from an n-bit data word having at least two m-bit data elements corresponding to m-bit operands.
 20. The method of claim 19 wherein the number of m-bit data elements in each data word is identified in an instruction.
 21. The method of claim 15 wherein at least one additional operation is performed in parallel with the first operation.
 22. The method of claim 15 further comprising updating parallel status flags in each parallel status flags register corresponding to each specified register.
 23. A circuit configured to saturate a result of a first operation comprising: a) an m+1-bit adder for adding together two m-bit operands and outputting an m+1 bit result; b) coupled to the adder, a first multiplexer for outputting one of the following values: i) the m least significant bits output by the adder; or ii) the m most significant bits output by the adder, wherein the m most significant bits is output by the first multiplexer when an average instruction is executed; c) a second multiplexer coupled to a third multiplexer for outputting a selected saturation value; and d) coupled to the adder, the third multiplexer for outputting one of the following values: i) the output from the first multiplexer; or ii) the output from the second multiplexer, wherein the output from the second multiplexer is output by the third multiplexer when there is overflow and saturation is enabled.
 24. The circuit of claim 23 further comprising means for setting an overflow flag, the setting means coupled to the third multiplexer.
 25. The circuit of claim 23 further comprising fifth and sixth multiplexer for outputting saturated values.
 26. The circuit of claim 23 wherein the output of the selected saturation value is determined by whether operands are signed or unsigned.
 27. The circuit of claim 23 wherein the operation is an addition operation.
 28. The circuit of claim 23 wherein the operation is a subtraction operation.
 29. The circuit of claim 23 wherein the operation is an average operation. 